DISTRICT OF COLUMBIA,

THIS PAPER IS CONCERNED WITH GRAPHIC PRESENTATION AND 
ANALYSIS OF GROUPED OBSERVATIONS. IT PRESENTS A METHOD AND 
SUPPORTING THEORY FOR THE CONSTRUCTION OF AN AREA-CONSERVING, 
MINIMAL LENGTH FREQUENCY POLYGON CORRESPONDING TO A GIVEN 
HISTOGRAM. TRADITIONALLY, THE CONCEPT OF A FREQUENCY POLYGON 
CORRESPONDING TO A GIVEN HISTOGRAM HAS REFERRED TO THAT 
POLYGON FORMED BY CONNECTING THE MIDPOINTS OF THE TOPS OF THE 
RECTANGLES MAKING UP THE HISTOGRAM. THE MOST IMPORTANT 
DEFICIENCY IN THE TRADITIONAL FREQUENCY POLYGON IS THAT THE 
AREA OF ANY SPECIFIC RECTANGLE IN THE UNDERLYING HISTOGRAM IS 
GENERALLY NOT EQUAL TO THE AREA UNDER THE FREQUENCY POLYGON 
OVER THE SAME INTERVAL. DUE TO THIS DEFICIENCY, DATA ARE 
SELDOM PRESENTED IN THE FORM OF A FREQUENCY POLYGON. (HW) 
This paper is concerned with the problem of the graphic
presentation and analysis of grouped observations. Suppose a set
of $N$ observations has been classified on the basis of $n$
contiguous and nonoverlapping intervals, i.e., each observation
falls into exactly one of the intervals, and then is identified
by that interval. Observations so classified and identified are
said to be grouped. The resulting recorded data are then in the
form of a set of $n+1$ interval boundaries and a set of $n$
integers $(N_1, \ldots, N_n)$, where $N_i$ indicates the number of
observations in the $i$th interval and $\sum N_i = N$.

Such observations are often graphically presented in a histogram
consisting of $n$ rectangles (contiguous and nonoverlapping) with
the widths ($w_i$) of the rectangles proportional to the lengths
of the grouping intervals and the heights ($h_i$) of the
rectangles proportional to $N_i/w_i$. The products $h_i w_i$ are
consequently proportional to the corresponding fraction of
observations in the interval, i.e., there exists a constant $c$
such that $c\,h_i w_i = N_i/N$. The constant $c$ is generally
incorporated into the scale used for drawing the histogram and
will be omitted henceforth.
If the last ($n$th) interval is not finite the proportion in it
is not represented in the histogram. In this case
$\sum_{i=1}^{n-1} h_i w_i = 1 - N_n/N$; otherwise
$\sum_{i=1}^{n} h_i w_i = 1$.
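The scaling just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `histogram_heights` and the `open_last` flag are hypothetical, and the constant $c$ is absorbed by taking $h_i = N_i/(N w_i)$.

```python
def histogram_heights(boundaries, counts, open_last=False):
    """Return (widths, heights) with h_i = N_i / (N * w_i).

    boundaries: finite interval boundaries, in increasing order.
    counts:     N_1 .. N_n, one count per interval.
    open_last:  True if the n-th interval is not finite; it then has a
                count but no width, and gets no rectangle.
    """
    total = sum(counts)
    widths = [b - a for a, b in zip(boundaries, boundaries[1:])]
    expected = len(counts) - 1 if open_last else len(counts)
    if len(widths) != expected:
        raise ValueError("boundaries and counts do not match")
    # zip stops at the shorter list, so an open last interval is skipped.
    heights = [c / (total * w) for c, w in zip(counts, widths)]
    return widths, heights
```

With all intervals finite the products $h_i w_i$ sum to 1; with an open last interval they sum to $1 - N_n/N$.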

Traditionally the concept of a frequency polygon (FP)
corresponding to a given histogram has meant that polygon formed
by connecting the midpoints of the tops of the rectangles making
up the histogram (Ref. Hald 49-51, Dixon & Massey 8-9). The most
important deficiency in the traditional frequency polygon is that
the area of any specific rectangle in the underlying histogram is
generally not equal to the area under the frequency polygon over
the same interval. Due to this deficiency data are seldom
presented in the form of a frequency polygon. The purpose of this
paper is to present a method and supporting theory for the
construction of an area-conserving, minimal length frequency
polygon corresponding to a given histogram.

For purposes of this paper histograms and frequency polygons will
be considered to have the following four parts (see figure 1):

1) the base line, of length $\sum w_i$, upon which the histogram
or FP is constructed;

2) & 3) the two sides, which rise vertically from both ends of
the base line;

4) the upper outline (UO), which comprises the remainder of the
histogram or FP, a function consisting of connected line segments
which connects the two sides across the top.

The base line will be assumed to be that portion of an axis of
abscissas from zero to $\sum w_i$.



In terms of this nomenclature, the method this paper presents is
that of the construction of an FP corresponding to a given
$n$-interval histogram such that

1) letting $z_i = \sum_{k=1}^{i} w_k$ (so that $z_0 = 0$),

$$\int_{z_{i-1}}^{z_i} \mathrm{FPUO}\; dx = h_i w_i, \qquad i = 1, \ldots, n$$

(or $i = 1, \ldots, n-1$ if the last interval is not finite);

2) the FPUO is the minimal length FPUO consisting
of 2n connected line segments such that there are



We will first show which two-segmented UO's conserve the area of
a single rectangle (see Figure 2). We are given the rectangle
ABCD, its sides being segments of vertical lines $L_1$ and $L_2$.
A UO is to be constructed consisting of the two line segments EP
and PF with E on $L_1$ a distance q below B and with F on $L_2$ a
distance r above C. In order to conserve the area of ABCD we
restrict q and r as follows: $q \le AB$, $r \ge -CD$. For such an
arbitrary point (q,r) in the q-r plane we seek the locus of
points P between $L_1$ and $L_2$ such that the area of the
pentagon AEPFD is the same as that of the rectangle ABCD. Locate
the point O which is the midpoint of the line segment BC
(Figure 3). Extend the line FO to intersect $L_1$ at E' and the
line EO to intersect $L_2$ at F'. Draw the line E'F'.

Figure 3
Theorem I

E'F' is the set of points P such that, for any P on it, the area
of the pentagon AEPFD is the same as that of the rectangle ABCD.

Proof

Draw in the auxiliary lines EF and GH, where GH is parallel to EF
and passes through O (figure 4). The area of the trapezoid AGHD
is the same as the area of ABCD since
$\triangle GOB \cong \triangle COH$. Trapezoid AGHD = trapezoid
AEFD + parallelogram EGHF, and parallelogram EGHF is
$\tfrac{1}{2}$ parallelogram EE'F'F. It remains to show that any
triangle EPF with P on E'F' has half of the area of parallelogram
EE'F'F. Draw PP' parallel to FF', with P' on EF (figure 5). Then
$\triangle EPP' = \tfrac{1}{2}$ parallelogram EE'PP' and
$\triangle FPP' = \tfrac{1}{2}$ parallelogram P'PF'F, so that

$\triangle EPF = \triangle EPP' + \triangle FPP' = \tfrac{1}{2}$ parallelogram EE'F'F. Q.E.D.

Figure 4

Figure 5
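The area claim can be checked numerically. The following is a minimal sketch, not part of the original method: it places A at the origin of an assumed coordinate system, builds E' and F' as the reflections of F and E through O, and measures the pentagon with the shoelace formula. The function names are hypothetical.

```python
def shoelace_area(points):
    """Polygon area from vertices listed in order (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def pentagon_area(h, w, q, r, t):
    """Area of AEPFD with P at fraction t along E'F' (0 gives E', 1 gives F').

    The rectangle ABCD has height h and width w, with A at (0, 0).
    """
    A, D = (0.0, 0.0), (w, 0.0)
    E, F = (0.0, h - q), (w, h + r)
    # E' and F' are the reflections of F and E through O = (w/2, h).
    E1, F1 = (0.0, h - r), (w, h + q)
    P = (E1[0] + t * (F1[0] - E1[0]), E1[1] + t * (F1[1] - E1[1]))
    return shoelace_area([A, E, P, F, D])
```

For any `t` between 0 and 1 the returned area should equal `h * w` up to rounding, whatever admissible `q` and `r` are chosen.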



We also need to know which P minimizes the sum of the lengths of
line segments EP + PF.
Theorem II

Given two points E and F and a line L parallel to the line
through E and F, the point P on L which minimizes EP + PF is at
the intersection of L and the perpendicular bisector of the
segment EF (figure 6).

Figure 6

Proof

Let O be a point between E and F, with PO perpendicular to EF,
and d the length of segment EF. Then for some q, $0 \le q \le 1$,
EO = qd and FO = (1-q)d. If PO = c, then

$$EP + PF = \sqrt{q^2 d^2 + c^2} + \sqrt{(1-q)^2 d^2 + c^2}.$$

The derivative of this with respect to q is

$$\frac{2 q d^2}{2\sqrt{q^2 d^2 + c^2}} - \frac{2(1-q) d^2}{2\sqrt{(1-q)^2 d^2 + c^2}}.$$

This is made equal to zero by letting $q = \tfrac{1}{2}$, which
means that EP + PF is minimized when EO = FO, that is, when PO is
the perpendicular bisector of EF.
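A brute-force check of this result is easy to write. The sketch below is illustrative only; the coordinates (E at the origin, F on the x-axis, L at height c) and the function names are assumptions made for the example.

```python
import math


def path_length(t, d, c):
    """EP + PF for E = (0, 0), F = (d, 0) and P = (t * d, c) on the line y = c."""
    return math.hypot(t * d, c) + math.hypot((1.0 - t) * d, c)


def best_fraction(d, c, steps=1000):
    """Grid search over t in [0, 1] for the shortest path E -> P -> F."""
    grid = (i / steps for i in range(steps + 1))
    return min(grid, key=lambda t: path_length(t, d, c))
```

For any positive `d` and `c` the search should settle on `t = 0.5`, the foot of the perpendicular bisector.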
Construct (figure 8) E'H perpendicular to EF, the length of
E'H = a.

EE' = BE - FC = q - r.

Q is the midpoint of EF, and b is the length of EQ.

QQ' is perpendicular to BE and QQ' = w/2. Thus since
$\triangle EE'H$ is similar to $\triangle EQQ'$,

$$\frac{w}{2} : b \;\text{ as }\; a : (q-r) \qquad\text{or}\qquad \frac{w}{2b} = \frac{a}{q-r} \qquad\text{or}\qquad a = \frac{w(q-r)}{2b},$$

so that

$$a^2 = \frac{w^2 (q-r)^2}{4 b^2}, \qquad b^2 = \frac{w^2 + (q+r)^2}{4}.$$

Figure 8

Thus

$$(EP)^2 = a^2 + b^2 = \frac{w^2 (q-r)^2}{w^2 + (q+r)^2} + \frac{w^2 + (q+r)^2}{4}.$$

Q.E.D.

For future reference let us identify this UO length thus:

$$g(q,r,w) = 2\,EP = \sqrt{\frac{4 w^2 (q-r)^2}{w^2 + (q+r)^2} + w^2 + (q+r)^2}$$
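As a sanity check on this length, a direct transcription into Python can be compared with cases whose answer is known without the theorem. The sketch is illustrative; only the name `g` comes from the text.

```python
import math


def g(q, r, w):
    """Length 2*EP of the minimal area-conserving two-segment outline."""
    s = w * w + (q + r) ** 2
    return math.sqrt(4.0 * w * w * (q - r) ** 2 / s + s)
```

Two special cases are worth noting. With q = r = 0 the outline is the flat top of the rectangle, so `g(0, 0, w)` is `w`. With q = r = t the straight segment EF already conserves the area, so `g(t, t, w)` reduces to the length of EF, `sqrt(w**2 + 4*t**2)`. The expression is also unchanged when q and r are exchanged or both change sign.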
In review, Theorems I and II show how to find the minimal length
area-conserving 2-segment UO for a given rectangle of width w and
given values of q and r. Theorem III establishes the length of
this UO. If q and w are given and we seek that r which gives us
the minimal length UO we find the derivative:

$$\frac{\partial g(q,r,w)}{\partial r} = \frac{1}{2g}\left[ -\frac{8 w^2 (q-r)}{w^2 + (q+r)^2} - \frac{8 w^2 (q-r)^2 (q+r)}{\left[w^2 + (q+r)^2\right]^2} + 2(q+r) \right]$$

Consider w as fixed and the above derivative to be a surface with
reference to the q-r plane. Consider the points (the function) on
the q-r plane at which this derivative surface passes through the
plane, i.e., consider the points at which the surface is zero. A
computer program was written which has shown this function to be
monotonic increasing for q in the range of our application. A
computer subroutine, named UZBEK, using an iterative procedure
(interval halving) has been written to find the required r for a
given q, w (and an r-minimum to prevent the UO from passing below
the base line of the rectangle).
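The interval-halving search can be sketched as follows. This is not the UZBEK subroutine itself, whose source is not given here; `best_r` is a hypothetical stand-in, and it assumes the derivative changes sign only once between the supplied bounds `r_min` and `r_max`.

```python
import math


def g(q, r, w):
    """Length of the minimal area-conserving two-segment outline."""
    s = w * w + (q + r) ** 2
    return math.sqrt(4.0 * w * w * (q - r) ** 2 / s + s)


def dg_dr(q, r, w):
    """Partial derivative of g with respect to r."""
    s = w * w + (q + r) ** 2
    bracket = (-8.0 * w * w * (q - r) / s
               - 8.0 * w * w * (q - r) ** 2 * (q + r) / s ** 2
               + 2.0 * (q + r))
    return bracket / (2.0 * g(q, r, w))


def best_r(q, w, r_min, r_max, iters=60):
    """Interval halving on dg/dr; returns r_min or r_max if no sign change."""
    lo, hi = r_min, r_max
    if dg_dr(q, lo, w) >= 0.0:
        return lo          # already rising at the lower limit
    if dg_dr(q, hi, w) <= 0.0:
        return hi          # still falling at the upper limit
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dg_dr(q, mid, w) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Passing the r-minimum as the lower bound gives the clamping behaviour described above: when the unconstrained optimum would lie below it, the lower bound itself is returned.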

We now consider the problem of constructing the minimal length,
area-conserving 4-segmented UO over a given 2-rectangle histogram
with only the sides of the FP given. Let $q_1$ be the left side
of the histogram minus the left side of the FP and $q_2$ be the
right side of the FP minus the right side of the histogram (see
figure 9). Let the difference between the height of the right
rectangle and the left rectangle be $r_1 + r_2 = d$. We now seek
the pair of numbers $(r_1, r_2)$ such that
$g(q_1, r_1, w_1) + g(q_2, r_2, w_2)$ is minimized. We again
consider the derivative surface $\partial g/\partial r$ over the
q-r plane for a fixed w. Let
$f_1(x) = \partial g(q_1, x, w_1)/\partial r$ and
$f_2(x) = \partial g(q_2, x, w_2)/\partial r$. These are the
intersections of the planes $q = q_1$ and $q = q_2$ with the
$w_1$ type derivative surface and the $w_2$ type respectively.
Notice that since $r_1 = d - r_2$, a change in $r_1$ produces a
change in $r_2$ which differs from that of $r_1$ only in sign.
Thus, when the point marking the division of d into $r_1$ and
$r_2$ is moved along the left side of the right rectangle the
rate of change of the length of the left portion of the FPUO is
opposite in sign to that of the right portion. We want the
overall derivative, or the sum of these two derivatives, to be
zero. This is accomplished when $f_1(r_1) = f_2(r_2)$ and
$r_1 + r_2 = d$. This is essentially a Lagrangian multiplier
problem but requires numerical methods. Another computer
subroutine, named ORYX, has been written to provide the minimal
length, area-conserving 4-segmented FPUO given $q_1$, $q_2$, d,
$w_1$, and $w_2$.

Figure 9
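The condition $f_1(r_1) = f_2(r_2)$ with $r_1 + r_2 = d$ reduces to a one-dimensional root search in $r_1$. The sketch below uses interval halving for that search. It is not the ORYX subroutine; `split_riser` and its bracket arguments `lo` and `hi` are hypothetical, and the bracket is assumed to contain exactly one sign change.

```python
import math


def g(q, r, w):
    """Length of the minimal area-conserving two-segment outline."""
    s = w * w + (q + r) ** 2
    return math.sqrt(4.0 * w * w * (q - r) ** 2 / s + s)


def dg_dr(q, r, w):
    """Partial derivative of g with respect to r."""
    s = w * w + (q + r) ** 2
    bracket = (-8.0 * w * w * (q - r) / s
               - 8.0 * w * w * (q - r) ** 2 * (q + r) / s ** 2
               + 2.0 * (q + r))
    return bracket / (2.0 * g(q, r, w))


def split_riser(q1, q2, d, w1, w2, lo, hi, iters=60):
    """Return (r1, r2) with r1 + r2 = d and f1(r1) = f2(r2).

    The combined derivative of g(q1, r1, w1) + g(q2, d - r1, w2) with
    respect to r1 is f1(r1) - f2(d - r1); we halve [lo, hi] on its sign.
    """
    def slope(r1):
        return dg_dr(q1, r1, w1) - dg_dr(q2, d - r1, w2)

    if not (slope(lo) < 0.0 < slope(hi)):
        raise ValueError("bracket does not straddle a minimum")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    r1 = 0.5 * (lo + hi)
    return r1, d - r1
```

In a symmetric configuration ($q_1 = q_2$, $w_1 = w_2$) the search should return $r_1 = r_2 = d/2$, which gives a quick plausibility check.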

We are now ready to describe a procedure for constructing an FP
for the general histogram made of n rectangles. In the histogram
UO there are n-1 vertical line segments which we will call
"risers". The midpoints of the risers are used for the first
stage estimates of the height of the required FP at the interval
boundaries. The first such riser midpoint is used to obtain an
estimate of the height of the left side of the FP (using UZBEK).
This left side point, along with the midpoint of the second
riser, is used to obtain a new point on the first riser (using
ORYX). Then the new first riser point and the third riser
midpoint are used to obtain a new second riser point. This method
is carried out across the histogram until only the last riser
midpoint remains unchanged. Then (using UZBEK) the last riser
midpoint is used to estimate the right side. Finally the right
side point and the new point on the next to the last riser are
used to obtain a new point on the last riser. This procedure is
repeated across the histogram several times until the point with
the greatest change in its position on its riser is less than
some prespecified number. The change tested is expressed as the
fraction of the riser traversed during the pass. This procedure
brings the maximum change under .0001 in about n passes.
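The sweep can be sketched compactly if the two subroutines are replaced by a single generic one-dimensional search. Everything here is an assumption made for illustration: the names `boundary_heights`, `argmin_bisect` and `total_length` are hypothetical, the search halves an interval on a finite-difference slope rather than on the analytic derivative, each local length is assumed to have a single minimum on its bracket, riser points are confined to their risers, and sides are confined to the range from the base line to twice the tallest rectangle.

```python
import math


def g(q, r, w):
    """Length of the minimal area-conserving two-segment outline."""
    s = w * w + (q + r) ** 2
    return math.sqrt(4.0 * w * w * (q - r) ** 2 / s + s)


def argmin_bisect(length, lo, hi, iters=60, eps=1e-7):
    """Interval halving on a central-difference slope of `length`."""
    def slope(t):
        return (length(t + eps) - length(t - eps)) / (2.0 * eps)
    if hi <= lo or slope(lo) >= 0.0:
        return lo
    if slope(hi) <= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def total_length(y, h, w):
    """Outline length for FP heights y[0..n] at the interval boundaries."""
    return sum(g(h[i] - y[i], y[i + 1] - h[i], w[i]) for i in range(len(h)))


def boundary_heights(h, w, tol=1e-4, max_passes=200):
    """FP heights at the n+1 boundaries for rectangle heights h, widths w."""
    n, top = len(h), 2.0 * max(h)
    # y[0] and y[n] are the sides; y[1..n-1] start at the riser midpoints.
    y = [h[0]] + [(h[i - 1] + h[i]) / 2.0 for i in range(1, n)] + [h[-1]]

    def riser(i):
        lo, hi = sorted((h[i - 1], h[i]))
        old = y[i]
        y[i] = argmin_bisect(
            lambda t: g(h[i - 1] - y[i - 1], t - h[i - 1], w[i - 1])
            + g(h[i] - t, y[i + 1] - h[i], w[i]), lo, hi)
        # Change expressed as a fraction of the riser traversed.
        return abs(y[i] - old) / (hi - lo) if hi > lo else 0.0

    for _ in range(max_passes):
        y[0] = argmin_bisect(
            lambda t: g(h[0] - t, y[1] - h[0], w[0]), 0.0, top)
        change = 0.0
        for i in range(1, n - 1):
            change = max(change, riser(i))
        y[n] = argmin_bisect(
            lambda t: g(h[-1] - y[n - 1], t - h[-1], w[-1]), 0.0, top)
        if n > 1:
            change = max(change, riser(n - 1))
        if change < tol:
            break
    return y
```

Each pass follows the order given in the text: left side, risers from left to right up to the next to last, right side, then the last riser. A riser of zero length (two adjacent rectangles of equal height) is left where it is, which is a simplification of this sketch.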

To complete the required frequency polygon, we need the points
($P_i$) within the intervals, which are derived from the points
on the risers. Let the origin for derivation of a given P be the
top left corner of the histogram rectangle for that interval (see
figure 10). The point P is at the intersection of

1) the line through $(0, -r)$ with slope $\frac{q+r}{w}$ and

2) the line through $\left(\frac{w}{2}, \frac{r-q}{2}\right)$
with slope $-\frac{w}{q+r}$.

Solving the resulting simultaneous equations in x and y we obtain

$$x = \frac{w}{2} + \frac{w\,(r^2 - q^2)}{(q+r)^2 + w^2} \qquad\text{and}\qquad y = \frac{(q-r)\left[w^2 - (q+r)^2\right]}{2\left[w^2 + (q+r)^2\right]}.$$
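These two expressions can be verified directly: the resulting P should be equidistant from E and F, and the outline E-P-F should enclose the same area as the rectangle. The sketch below is illustrative; the function names and the explicit rectangle height `h` are assumptions added for the check.

```python
import math


def apex(q, r, w):
    """P relative to the top left corner of the rectangle."""
    s = w * w + (q + r) ** 2
    x = w / 2.0 + w * (r * r - q * q) / s
    y = (q - r) * (w * w - (q + r) ** 2) / (2.0 * s)
    return x, y


def area_under_outline(h, q, r, w):
    """Area between the base line and E-P-F for a rectangle of height h."""
    x, y = apex(q, r, w)
    left = x * ((h - q) + (h + y)) / 2.0          # trapezoid under EP
    right = (w - x) * ((h + y) + (h + r)) / 2.0   # trapezoid under PF
    return left + right
```

For q = 0.5, r = 0.2, w = 1 the apex lies near (0.359, 0.051), slightly above the top of the rectangle and left of its centre.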
This completes Part I of the outline of the theory and method of
constructing the required frequency polygon.

An important characteristic of such FPs is that their shapes are
not invariant when the width ($w_i$) scale is changed. Since this
scale is essentially arbitrary (we can measure lengths in inches
or meters or furlongs, etc.), we need a criterion which
establishes a standard scale for a given set of grouped data.
This topic will be taken up in the next part. Other topics to be
considered are:

1) uses of the minimal length area-conserving frequency polygon,
such as:

a) comparison of hypothesized theoretical distributions and
corresponding actual observations

b) interpolation of percentiles derived from grouped data,

2) sample graphs of frequency polygons.